Abstract

What is the length of the shortest sequence S of reals so that the
set of consecutive n-words in S forms a covering code for permutations
on {1, 2, ..., n} of radius R? (The distance between two n-words is
the number of transpositions needed to have the same order type.)
The above problem can be viewed as a special case of finding a De 
Bruijn covering code for a rooted hypergraph. Each edge of a rooted 
hypergraph contains a special vertex, called the root of the edge, and 
each vertex is the root of a unique edge, called its ball. A De Bruijn 
covering code is a subset of the roots such that every vertex is in some 
edge containing a chosen root. Under some mild conditions, we obtain 
an upper bound for the shortest length of a De Bruijn covering code 
of a rooted hypergraph, a bound which is within a factor of log n of
the lower bound.



1 Introduction 

Suppose G is a graph whose vertex set consists of some subset of all n-tuples
$X^n$ over a finite alphabet $X$. The natural distance metric $d(\cdot,\cdot)$
on this graph allows us, for each nonnegative integer $R$, to define a "ball
of radius $R$ centered at $x \in X^n$" by

$$B(x;R) = \{y : d(x,y) \le R\}.$$



One may ask for a subset of the vertices so that their respective balls cover 
the entire graph. Such a set is commonly called a covering code of radius R 
for the graph G. 
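
As a small illustration of this definition, the following sketch checks by brute force whether a set of vertices is a covering code. It assumes the Hamming metric on the binary cube; the function names (`hamming`, `ball`, `is_covering_code`) are hypothetical and not taken from the paper.

```python
from itertools import product

def hamming(u, v):
    """Number of coordinates in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def ball(x, R, vertices, dist=hamming):
    """B(x; R): all vertices at distance at most R from x."""
    return {y for y in vertices if dist(x, y) <= R}

def is_covering_code(code, R, vertices, dist=hamming):
    """True if the radius-R balls about the codewords cover every vertex."""
    covered = set()
    for c in code:
        covered |= ball(c, R, vertices, dist)
    return covered == set(vertices)

# Assumed example: the binary 3-cube with radius 1.
cube = list(product((0, 1), repeat=3))
print(is_covering_code([(0, 0, 0), (1, 1, 1)], 1, cube))  # True
print(is_covering_code([(0, 0, 0)], 1, cube))             # False
```

The two balls about 000 and 111 contain the words of weight at most one and at least two, respectively, so together they cover the cube.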

Because the vertex set of our graph consists of sequences of symbols, it is 
sometimes possible to find a particularly compact representation of a given 
covering code C. Consider a string S of elements of X of length |C|. We say
that S is a De Bruijn covering code for G if the set of consecutive n-words
(with "wrap-around") in S is exactly the set C. Then, instead of writing
down all of C, we can specify the code with n times the efficiency by using S.



Such an object was considered in pQ - the graph was precisely the q- 
ary Hamming cube, q a prime power, where our definition of a covering 
code coincides with the classical one. In particular, the authors asked for 
the length of the shortest De Bruijn covering code of a given radius R, and 
showed that one exists with length given by 



Later, Vu [I] extended these bounds to all q and greatly simplified the proof. 
The upper and lower bounds still do not meet, however, and it is an inter- 
esting question to close this gap. 

It is also natural to ask analogous questions for other sets of sequences
than all of $X^n$, possibly with an equivalence relation defined on these
sequences. One could ask for multisets of size n, permutations, or any Cayley
graph G defined on a subset of $X^n$. For example, the string 134526 is a
radius 1 covering code for the permutations on four symbols: indeed, every
permutation is at a "transposition distance" of at most one from 1234, 2341,
2314, 3241, 2413, or 4123, the six order-types which occur as consecutive
4-words in the string. De Bruijn covering codes for these other types of graphs
are precisely the subject of this paper.
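
The example with the string 134526 can be checked mechanically. The sketch below is only an illustration: the helper names are hypothetical, windows are read with wrap-around, and "transposition distance at most one" is implemented as "equal, or differing by one swap of two entries".

```python
from itertools import combinations, permutations

def order_type(word):
    """Tuple of ranks 1..n with the same relative order as word (distinct entries)."""
    ranks = {v: r for r, v in enumerate(sorted(word), start=1)}
    return tuple(ranks[v] for v in word)

def cyclic_words(s, n):
    """The len(s) windows of n consecutive symbols of s, read with wrap-around."""
    m = len(s)
    return [tuple(s[(i + j) % m] for j in range(n)) for i in range(m)]

def transposition_ball(p, radius=1):
    """Permutations reachable from p by at most `radius` swaps of two entries."""
    seen = {p}
    frontier = {p}
    for _ in range(radius):
        nxt = set()
        for q in frontier:
            for i, j in combinations(range(len(q)), 2):
                r = list(q)
                r[i], r[j] = r[j], r[i]
                nxt.add(tuple(r))
        frontier = nxt - seen
        seen |= nxt
    return seen

s = (1, 3, 4, 5, 2, 6)
types = [order_type(w) for w in cyclic_words(s, 4)]
covered = set().union(*(transposition_ball(t, 1) for t in types))
print(types)
print(covered == set(permutations(range(1, 5))))  # True
```

The six order-types printed are 1234, 2341, 2314, 3241, 2413 and 4123, and their radius 1 balls together contain all 24 permutations of four symbols.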

Therefore we generalize this idea as follows. A hypergraph is a pair
$(V, \mathcal{E})$, where $V$ is the set of vertices and $\mathcal{E}$ is a
family of subsets of $V$, called (hyper)edges. A k-uniform hypergraph is one
in which every hyperedge has cardinality k, and a graph is a 2-uniform
hypergraph. Given a set $A$, a hypergraph $\mathcal{H}$ on $A$ is said to be a
rooted hypergraph if every edge contains a special vertex, called the root of
the edge, and each vertex is the root of exactly one edge. Given a vertex
$a \in A$, we denote the unique edge of which it is the root by $\bar{a}$,
which we call the ball about $a$. An endomorphism of a rooted hypergraph
$\mathcal{H} = (V, \mathcal{E})$ is a map $\phi : \mathcal{H} \to \mathcal{H}$
so that $\phi(e) \in \mathcal{E}$ for all $e \in \mathcal{E}$ and
$\phi(\bar{a}) = \overline{\phi(a)}$ for all $a \in V$. An automorphism of
$\mathcal{H}$ is a bijective endomorphism whose inverse is also an
endomorphism. $\mathcal{H}$ is said to be transitive if its automorphism group
acts transitively on its set of root-ball pairs. We are primarily concerned
with transitive rooted hypergraphs in the sequel.

A covering code for a rooted hypergraph $\mathcal{H}$ on $A$ is a subset
$S \subset A$ with the property that, for each $a \in A$, there exists a
$b \in S$ with $a \in \bar{b}$. In other words, a covering code is a set of
vertices so that every vertex of the hypergraph belongs to some edge whose
root lies in the set.

Suppose $X$ is a (finite or infinite) set, $\Pi$ is a family of disjoint
subsets of $X^n$, and $\mathcal{H}$ is a rooted hypergraph on $\Pi$. Write
$\Pi(x)$ for the member of $\Pi$ containing $x$. Given a sequence
$S = (s_0, \ldots, s_{M-1})$ with $s_i \in X$, write

$$S_k^{(n)} = (s_k, \ldots, s_{k+n-1}),$$

and $S^{(n)} = \{S_i^{(n)} : 1 \le i \le M\}$, where all indices are taken
modulo $M$. We call $S$ an order n De Bruijn covering code for $\Pi$ if
$S^{(n)}$ is a covering code for $\mathcal{H}$, and call $|S| = M$ its length.

We recover the previous definition of a De Bruijn covering code of radius
$R$ by taking $X = \{0, 1\}$, $\Pi$ the partition into singletons, and
$\mathcal{H}$ the set of Hamming $R$-balls, i.e., all
$\bar{v} = \{w : d(v, w) \le R\}$ for $v \in \{0, 1\}^n$.
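
In this binary Hamming special case the definition is easy to test by exhaustive search. The following is a minimal sketch under that assumption; `windows` and `is_de_bruijn_covering_code` are hypothetical names, and indices are taken modulo the length of the string as in the definition above.

```python
from itertools import product

def windows(s, n):
    """Set of n-words s_k, ..., s_{k+n-1}, with indices taken modulo len(s)."""
    m = len(s)
    return {tuple(s[(i + j) % m] for j in range(n)) for i in range(m)}

def is_de_bruijn_covering_code(s, n, R, alphabet=(0, 1)):
    """True if every n-word over the alphabet is within Hamming distance R
    of some cyclic n-window of s."""
    code = windows(s, n)
    return all(
        any(sum(a != b for a, b in zip(v, c)) <= R for c in code)
        for v in product(alphabet, repeat=n)
    )

print(is_de_bruijn_covering_code((0, 0, 1, 1), 3, 1))  # True
print(is_de_bruijn_covering_code((0, 0, 0, 1), 3, 1))  # False: 111 is missed
```

The string 0011 has cyclic 3-words 001, 011, 110, 100, whose radius 1 balls cover the cube, while 0001 has no window within distance one of 111.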

For a hypergraph $\mathcal{H}$, we write $N(\pi)$ for the set of edges
containing $\pi$, and $\deg(\pi)$ for $|N(\pi)|$. Finally, we write
$f(n) \prec g(n)$ to mean that there exists a $c > 0$ so that, for
sufficiently large $n$, $f(n) \le c\,g(n)$, and $f \sim g$ to mean that
$g \prec f \prec g$. Then we have the following theorem.

Theorem 1. Let $(X, \Pi, \mathcal{H})$ be as above, with $X$ finite, and let
$\mathcal{H}$ be a transitive rooted hypergraph. Suppose that
$|\bar{\pi}| = K(n) \prec |\Pi|^2/n$ for all $\pi \in \Pi$. Denote by $T_k$
the random variable $|\bar{\pi}_1 \cap \bar{\pi}_2|$, where $\pi_1$ and
$\pi_2$ are chosen as follows: we pick a string $A$ uniformly at random from
$X^{n+k}$, and set $\pi_1 = \Pi(A_1^{(n)})$, $\pi_2 = \Pi(A_{k+1}^{(n)})$.
Then, if

$$\sum_{1 \le k \le n-1} \mathbf{E}(T_k) \prec K,$$

there exists a De Bruijn covering code $S$ whose length satisfies

$$\frac{|\Pi|}{K} \prec |S| \prec \frac{|\Pi| \log n}{K}.$$
We delay the proof to Section 3. One useful corollary is the following
simplification: take $\Pi$ to be the partition into singletons and the edges
of $\mathcal{H}$ to be the $R$-balls in the graphical distance metric. The
result then follows immediately from Theorem 1, since
$K \le q^n \prec q^{2n}/n$ trivially.

Corollary 2. Let $G = (V, E)$ be a transitive graph, with $V = X^n$ for some
set $X$ of cardinality $q < \infty$. Suppose that $|B(v;R)| \sim K(n)$ for all
$v \in V$. Then, if

$$\sum_{1 \le k \le n-1} \; \sum_{x_1, \ldots, x_{n+k} \in X} |B(x_1, \ldots, x_n; R) \cap B(x_{k+1}, \ldots, x_{k+n}; R)| \cdot q^{-n-k} \prec K,$$

there exists a De Bruijn covering code $S$ whose length satisfies

$$\frac{q^n}{K} \prec |S| \prec \frac{q^n \log n}{K}.$$
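
For very small parameters, the quantity controlled by the hypothesis of Corollary 2 can be evaluated by brute force. The sketch below assumes the Hamming metric on $X^n$ with $X = \{0, \ldots, q-1\}$; the function name `corollary_sum` is hypothetical, and the computation is exponential in $n$, so it is meant only as a sanity check.

```python
from itertools import product
from math import comb

def corollary_sum(n, R, q=2):
    """Sum over 1 <= k <= n-1 of the average, over x in X^(n+k), of
    |B(x_1..x_n; R) & B(x_{k+1}..x_{k+n}; R)| under the Hamming metric."""
    X = range(q)
    words = list(product(X, repeat=n))
    balls = {
        v: frozenset(w for w in words
                     if sum(a != b for a, b in zip(v, w)) <= R)
        for v in words
    }
    total = 0.0
    for k in range(1, n):
        acc = 0
        for x in product(X, repeat=n + k):
            acc += len(balls[x[:n]] & balls[x[k:k + n]])
        total += acc * q ** (-(n + k))  # divide by the number of strings
    return total

n, R = 5, 1
K = sum(comb(n, j) for j in range(R + 1))  # ball size for q = 2
print(corollary_sum(n, R), K)
```

With $R = 0$ the balls are singletons and the sum reduces to $(n-1)q^{-n}$, which gives a quick consistency check on the implementation.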



2 De Bruijn covering codes for permutations 
and Hamming space 

Our first application generalizes the results of P and [3] to arbitrary (small)
radii.

Theorem 3. For $R = o(n)$, and any number of symbols $q$, there exists a
De Bruijn covering code of radius $R$ for the q-ary Hamming space (with the
ordinary Hamming metric) of dimension $n$ having cardinality
$\prec q^n \log n / \binom{n}{R}$.

Proof. We apply Corollary 2 to $X = \{1, \ldots, q\}$. Clearly, for every
$v \in X^n$, $|B(v; R)| = |B(0; R)| = \sum_{k=0}^{R} \binom{n}{k}$. It is easy
to see that

$$\sum_{k=0}^{R} \binom{n}{k} \sim \binom{n}{R}$$

when $R = o(n)$, so we may simply take $K = \binom{n}{R}$. Now, fix $k$,
$1 \le k \le n-1$, and consider the set

$$S_k = \bigcup_{x_1, \ldots, x_{n+k} \in X} B(x_1, \ldots, x_n; R) \cap B(x_{k+1}, \ldots, x_{k+n}; R).$$

We wish to count the elements of $S_k$. They correspond to pairs of strings
$(s,t)$, $s \in X^{n+k}$ and $t \in X^n$, so that $d(s_1^{(n)}, t) \le R$ and
$d(s_{k+1}^{(n)}, t) \le R$. In other words, the following family of equations
holds:

$$\begin{array}{c} s_1 = t_1 = s_{k+1} \\ \vdots \\ s_n = t_n = s_{n+k} \end{array}$$

except for at most $R$ of the left-hand equalities and at most $R$ of the
right-hand equalities. For any particular choice of "violated" equalities, we
may construct all possible solutions to the system by choosing
$s_1, \ldots, s_k$ arbitrarily, and then arbitrarily choosing each of the
values which follows an inequality. All other values are determined by these
choices. That is,

$$|S_k| \prec \binom{n}{R}^2 q^{k+2R}.$$

We therefore have that

$$\sum_{1 \le k \le n-1} |S_k|\, q^{-n-k} \prec \sum_{1 \le k \le n-1} \binom{n}{R}^2 q^{k+2R}\, q^{-n-k} \prec n \binom{n}{R}^2 q^{2R-n}.$$



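In order to apply Corollary 2, we need $n\binom{n}{R} \prec q^{n-2R}$.
However, by Stirling's Formula,

$$\binom{n}{R} \prec \left(\varepsilon^{\varepsilon}(1-\varepsilon)^{1-\varepsilon}\right)^{-n},$$

where $R = \varepsilon n$. Since
$\lim_{\varepsilon \to 0^+} \varepsilon^{\varepsilon}(1-\varepsilon)^{1-\varepsilon} = 1$,
the result follows. □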

We also have the following implication. Given two sequences of reals
$(a_1, \ldots, a_k) \in \mathbb{R}^k$ and $(b_1, \ldots, b_k) \in \mathbb{R}^k$,
we say that they have the same order type if $a_i < a_j$ iff $b_i < b_j$ for
every $1 \le i, j \le k$. In [2], it is shown that there is a sequence of $n!$
reals so that the consecutive n-tuples represent every order type. Here we
have a similar result.

Theorem 4. For any fixed positive integer $R$, there exists a sequence of reals
$S = \{a_0, \ldots, a_{M-1}\}$ so that every permutation $\sigma$ on $n$
symbols differs from the order-type of some consecutive n-word in $S$ by at
most $R$ transpositions, and

$$\frac{n!}{n^{2R}} \prec M \prec \frac{n! \log n}{n^{2R}}.$$

Proof. Take $X = \mathbb{R}$, and $\Pi$ the equivalence relation on n-tuples
of reals in [0, 1] with no repeated elements which represents their order-type
(i.e., $(x_1, \ldots, x_n) \sim_\Pi (y_1, \ldots, y_n)$ iff
$x_i < x_j \Leftrightarrow y_i < y_j$ for all $i, j$). For $\pi \in \Pi$,
$\bar{\pi}$ is the $R$-ball rooted at the permutation $\pi$ under the
transposition-distance metric. Then

$$K = |\bar{\pi}| = \frac{n^{2R}}{2^R R!}(1 + o(1)) \prec |\Pi|^2/n = n!^2/n.$$
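
The size of a transposition ball can be computed exactly for moderate $n$, which gives a numerical feel for the estimate $K \approx n^{2R}/(2^R R!)$. The sketch below relies on the standard fact that the transposition distance from a permutation to the identity is $n$ minus its number of cycles, so the ball of radius $R$ is counted by unsigned Stirling numbers of the first kind; the function names are hypothetical.

```python
from math import factorial

def stirling_cycle_row(n):
    """c(n, k) for k = 0..n: permutations of n elements with exactly k cycles,
    via the recurrence c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k)."""
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m + 1):
            a = row[k - 1] if k >= 1 else 0
            b = row[k] if k < m else 0
            new[k] = a + (m - 1) * b
        row = new
    return row

def transposition_ball_size(n, R):
    """Number of permutations of n symbols within R transpositions of a fixed
    one, i.e. with at least n - R cycles relative to it."""
    row = stirling_cycle_row(n)
    return sum(row[n - j] for j in range(min(R, n - 1) + 1))

n, R = 200, 2
exact = transposition_ball_size(n, R)
estimate = n ** (2 * R) / (2 ** R * factorial(R))
print(exact, estimate, exact / estimate)
```

For fixed $R$ the printed ratio tends to 1 as $n$ grows, in line with the $(1 + o(1))$ factor.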

In order to apply Theorem 1, we must show that, if we choose
$x_1, \ldots, x_{n+k}$ uniformly at random from [0, 1], then, if we denote by
$\pi_1$ and $\pi_2$ the equivalence classes of $(x_1, \ldots, x_n)$ and
$(x_{k+1}, \ldots, x_{k+n})$, respectively, we have

$$\sum_{1 \le k \le n-1} \mathbf{E}\,|\bar{\pi}_1 \cap \bar{\pi}_2| \prec K.$$



Fix $k \in [n-1]$. If $\bar{\pi}_1 \cap \bar{\pi}_2$ is nonempty, then there
is some set $S \subset [n]$ with $|S| \le 4R$, so that whenever
$\pi_1(x_1) < \pi_1(x_2)$ and $x_1, x_2 \in [n] \setminus S$, then
$\pi_2(x_1) < \pi_2(x_2)$. In particular, the event that
$|\bar{\pi}_1 \cap \bar{\pi}_2| \ne 0$ has probability at most the number of
possible sets $S$ times the probability that two independent, uniformly chosen
permutations of $[k] \setminus S$ are identical, since $\pi_1$ and $\pi_2$
restricted to this set are independent. Therefore, for $n > 8R$,



8i?<fc<n-l v 7 8R<fc<n-l v ' \ / \ / J 

Now, suppose $k < 8R$ and $n > 128R^2 + 16R$. Let
$X_j = \{k(j-1)+1, \ldots, kj\}$ for $j = 1, \ldots, \lfloor n/k \rfloor$.
Clearly, $\lfloor n/k \rfloor > n/(8R) - 1$. On the other hand, only at most
$4R$ of the $X_j$ contain a point of $S$. Therefore, there is a run of
consecutive $j$'s of length at least

$$\frac{n/(8R) - 1 - k}{4R} \ge \frac{n - 8R - 64R^2}{32R^2} \ge \frac{n}{64R^2}$$

so that each $X_j$ contains no point of $S$. This means that the least $x_i$
in each interval $\{x_{k(j-1)+1}, \ldots, x_{kj}\}$ for $j$ in this range is a
monotone sequence. This event has probability $2/(n/64R^2)!$, whence

11 2l) - {n/MR2)\\2R) 



l<k<8R 



□ 



3 Proof of the main theorem 

We need a result of Janson to proceed. The following appears in jH]. First, 
some notation. Let $I$ be an index set for a set of events $\{B_i\}_{i \in I}$.
Define a graph $\sim$ on $I$ with the following property: Let $J_1$ and $J_2$
be two disjoint subsets of $I$ such that there is no $i_1 \in J_1$ and
$i_2 \in J_2$ with $i_1 \sim i_2$. Now, let $A_1$ be any Boolean function of
the events $\{B_i : i \in J_1\}$ and let $A_2$ be any Boolean function of the
events $\{B_i : i \in J_2\}$. Then $A_1$ and $A_2$ are independent.

Let $\mu = \sum_{i \in I} P(B_i)$, $\Delta = \sum_{i \sim j} P(B_i \wedge B_j)$,
and $\delta = \max_{i} \sum_{j \sim i} P(B_j)$. Then the following holds.

Lemma 5. With the above notation,

$$P\Big(\bigwedge_{i \in I} \overline{B_i}\Big) \le \exp\left(-\min\left(\frac{\mu^2}{8\Delta}, \frac{\mu}{2}, \frac{\mu}{6\delta}\right)\right).$$
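
Reading the bound of Lemma 5 as $\exp(-\min(\mu^2/(8\Delta), \mu/2, \mu/(6\delta)))$, the same minimum that is estimated in the proof below, it can be evaluated numerically as follows. This is only a sketch: the function name is hypothetical, and treating a vanishing $\Delta$ or $\delta$ as an infinite term is an assumption made here for convenience.

```python
from math import exp, inf

def janson_upper_bound(mu, Delta, delta):
    """exp(-min(mu^2 / (8 Delta), mu / 2, mu / (6 delta))).

    A term whose denominator is zero is treated as +infinity (assumption),
    so that it never determines the minimum."""
    t1 = mu * mu / (8 * Delta) if Delta > 0 else inf
    t3 = mu / (6 * delta) if delta > 0 else inf
    return exp(-min(t1, mu / 2, t3))

# Assumed toy values, not taken from the paper.
print(janson_upper_bound(2.0, 1.0, 1.0))
```

In the toy call the three terms are 1/2, 1 and 1/3, so the bound is $e^{-1/3}$.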

Proof of Theorem 1. The number of vertices in $\mathcal{H}$ is $|\Pi|$, so the
lower bound follows immediately from the fact that $N$ edges cannot cover more
than $O(KN)$ vertices.

As for the upper bound, our two-step strategy is as follows: first, we
take a random set of vertices from the hypergraph; then, we "patch up" the
string $S$ by appending all the equivalence classes we miss to the end. So,
take a string $S$ of length $M$, chosen from the uniform distribution on
$X^M$. Write $e_i$ for the element of $\Pi$ containing $S_i^{(n)}$, and let
$u$ denote a random choice of n-word drawn uniformly at random. Denote by
$B_i^\pi$ the event that $\bar{e}_i$ contains $\pi \in \Pi$. Then the
probability that a given $\pi$ is covered by no edge whose root appears among
the $e_i$ is given by $P(\bigwedge_{i=1}^{M} \overline{B_i^\pi})$. Note that we
may take $B_i^\pi \sim B_j^\pi$ iff $|i - j| < n \pmod{M}$.

We now estimate $\mu^\pi$, $\Delta^\pi$, and $\delta^\pi$. Since
$\deg(\pi) = K$ for each $\pi \in \Pi$ by the transitivity of $\mathcal{H}$,



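$$\mu^\pi = \sum_{i=1}^{M} P(B_i^\pi) = \sum_{i=1}^{M} P\big(\pi \in \overline{\Pi(u)}\big) = \sum_{i=1}^{M} \frac{K}{|\Pi|} = \frac{MK}{|\Pi|}$$

and

$$\Delta^\pi = \sum_{1 \le |i-j| < n \ (\mathrm{mod}\ M)} P(B_i^\pi \wedge B_j^\pi) \prec \sum_{i=1}^{M} \sum_{1 \le k < n} \frac{\mathbf{E}(T_k)}{|\Pi|} = \sum_{1 \le k < n} \frac{M\,\mathbf{E}(T_k)}{|\Pi|} \prec \frac{MK}{|\Pi|},$$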



so that 



Also, 



$$\frac{(\mu^\pi)^2}{8\Delta^\pi} \gg \frac{MK}{|\Pi|}.$$



Therefore,

$$\min\left(\frac{\mu^2}{8\Delta}, \frac{\mu}{2}, \frac{\mu}{6\delta}\right) \gg \frac{MK}{|\Pi|},$$

and setting $M = C|\Pi| \log n / K$ for an appropriate constant $C$ yields



If $S$ is a random string of length $M$, this means there are at most
$|\Pi|/n$ classes in $\Pi$ which do not belong to any of the balls about
points of $S^{(n)}$. Then $S'$ satisfies the conclusion of the theorem, where
$S'$ contains two consecutive copies of $S$ followed by a concatenated list of
one string from each "left out" element of $\Pi$. □
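
The two-step strategy of the proof (random string, then patching) can be sketched concretely in the Hamming setting, where $\Pi$ is the partition into singletons. This is an illustration under that assumption, not the paper's construction in general; the function names, the seed, and the parameters are hypothetical, and the sketch assumes $M \ge n$ so that every cyclic window of $S$ occurs inside the doubled string.

```python
import random
from itertools import product

def windows(s, n):
    """Set of cyclic n-windows of s."""
    m = len(s)
    return {tuple(s[(i + j) % m] for j in range(n)) for i in range(m)}

def uncovered(s, n, R, alphabet):
    """All n-words at Hamming distance > R from every cyclic n-window of s."""
    code = windows(s, n)
    return [v for v in product(alphabet, repeat=n)
            if all(sum(a != b for a, b in zip(v, c)) > R for c in code)]

def random_then_patch(n, R, M, alphabet=(0, 1), seed=0):
    """Step 1: uniform random string S of length M (requires M >= n).
    Step 2: S' = S S followed by each word missed by S, written out in full."""
    rng = random.Random(seed)
    s = [rng.choice(alphabet) for _ in range(M)]
    missed = uncovered(s, n, R, alphabet)
    patched = s + s
    for v in missed:
        patched.extend(v)
    return patched, len(missed)

patched, missed = random_then_patch(n=6, R=1, M=20)
print(len(patched), missed, uncovered(patched, 6, 1, (0, 1)) == [])
```

Doubling $S$ guarantees that its wrap-around windows survive once more symbols are appended, and each missed word then appears verbatim as a window of the patched string, so the final check always succeeds.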
4 Problems and remarks



The problem which looms largest is, of course, the elimination of the log n
in the numerator of our upper bounds. There are also a number of related
questions which the techniques of this paper do not appear to resolve: 

1. In [2], the authors show that a De Bruijn cycle exists for permutations
of n elements, using only 6n numbers. That is, there is a sequence of
length exactly n!, consisting of at most 6n reals, so that the set of order-
types represented by consecutive n-words contains each permutation of
n exactly once. We may ask something similar for De Bruijn covering
codes: how long is the shortest De Bruijn covering code of radius R for
permutations of n elements which uses only, say, Cn symbols?

Furthermore, in [2], the authors conjectured there is a De Bruijn cycle 
for permutations of n elements using exactly n + 1 numbers. This 
conjecture is still open. 

2. An error correcting code, which can be thought of as dual to covering
codes, is a subset of $[q]^n$ so that no two words in the code are less
than R symbol-changes apart. What is the longest q-ary sequence
whose consecutive n-words form an error correcting code of radius R?
Clearly, we may ask analogous questions for permutations and other
rooted hypergraphs.

3. Given a subset S of the permutations on n symbols and a radius R,
one may ask, what is the length of the shortest De Bruijn covering code
for S? If $|S| = o(n)$, then choosing the sequence uniformly at random
may be very suboptimal. For example, one may take S to be the set
of permutations with at most k descents.
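
For the second problem above, the error-correcting condition on a candidate sequence can be tested directly. The sketch below assumes the Hamming metric and wrap-around windows, and counts windows starting at different positions as different codewords (so a repeated window fails for any $R \ge 1$); the function name is hypothetical.

```python
from itertools import combinations

def is_de_bruijn_error_correcting(s, n, R):
    """True if any two cyclic n-windows of s, taken from different starting
    positions, differ in at least R coordinates."""
    m = len(s)
    words = [tuple(s[(i + j) % m] for j in range(n)) for i in range(m)]
    return all(
        sum(a != b for a, b in zip(u, v)) >= R
        for u, v in combinations(words, 2)
    )

print(is_de_bruijn_error_correcting((0, 0, 1, 1), 4, 2))  # True
print(is_de_bruijn_error_correcting((0, 0, 1, 1), 4, 3))  # False
```

The cyclic 4-words of 0011 are 0011, 0110, 1100 and 1001, which are pairwise at Hamming distance 2 or 4.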